- CHAPTER 23 - XLAT
-
-
- The 800 pound gorilla in the computer field is, of course, IBM.
- It can go its own way and other companies have to adjust to keep
- themselves in line with what IBM is doing.
-
- You have been using ASCII characters since the first time you
- used BASIC (or whatever your first high-level language was).
- Every character has a unique number which represents it.
-
-      character     ASCII encoding
-
-          A              65d
-          a              97d
-          ?              63d
-          0              48d
-
- IBM has its own encoding for mainframe computers. It is called
- EBCDIC (pronounced ebb'-sih-dick).{1} It is a spinoff of the
- coding on punch cards. You remember punch cards? This coding is
- entirely different from ASCII. Here are some examples.
-
-      character     ASCII code     EBCDIC code
-
-          a             97d           129d
-          ?             63d           111d
-          0             48d           240d
-          H             72d           200d
-          I             73d           201d
-          J             74d           209d
-          K             75d           210d
-
- You can see that there is no relationship between the two
- encodings. Also, notice that while the alphabet is a continuous
- section of ASCII coding, there are breaks in the EBCDIC code
- (I=201, J=209).
-
- All PCs use ASCII, so if we want to transfer text from a PC to an
- IBM mainframe computer, we need to change ASCII -> EBCDIC going
- to the mainframe and change EBCDIC -> ASCII coming from the
- mainframe. This is the responsibility of the communications
- program that runs the modem, so you will never have to do it
- yourself. Intel has provided an instruction to help the
- communications program do this translation. It is called XLAT.
-
- In order to use XLAT, you need a translation table. This is a 256
- byte array where each element of the array contains the result
- you want. Looking at the data above:
-
- ____________________
-
- 1. Which stands for Extended Binary Coded Decimal Interchange
- Code.
-
- ______________________
-
- The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson
-
-      CHARACTER     ASCII TO EBCDIC TABLE     EBCDIC TO ASCII TABLE
-
-          a         array1 [97] = 129         array2 [129] = 97
-          ?         array1 [63] = 111         array2 [111] = 63
-          0         array1 [48] = 240         array2 [240] = 48
-          H         array1 [72] = 200         array2 [200] = 72
-          I         array1 [73] = 201         array2 [201] = 73
-          J         array1 [74] = 209         array2 [209] = 74
-          K         array1 [75] = 210         array2 [210] = 75
-
- We have two different tables here. Array1 takes the ASCII
- encoding and gives back the EBCDIC encoding. Array2 takes the
- EBCDIC encoding and gives back the ASCII encoding. For each
- character, the appropriate table gives the correct translation
- from one encoding to another. All we need now is the translation
- instruction. Put the address of the translation table in BX. This
- table should be in the DS segment, but DS may be overridden:
-
- mov bx, offset ascii_to_ebcdic_table
-
- Put the character you want translated in al:
-
- mov al, character
-
- Then translate:
-
- xlat
-
- To translate a 20 byte string of ASCII data into EBCDIC, you
- might have the following code:
-
- ;----------
- mov di, offset ebcdic_string
- mov ax, seg ebcdic_string
- mov es, ax
-
- mov si, offset ascii_string
-
- mov bx, offset ascii_to_ebcdic_table
- mov cx, 20 ; translate 20 bytes
- cld ; clear DF (increment)
-
- translation_loop:
- lodsb ; ascii to al
- xlat ; translate
- stosb ; al to ebcdic
- loop translation_loop
- ; ----------
-
- Since this is ASCII to EBCDIC, if AL contained 63 before XLAT,
- then after XLAT AL would contain 111. If AL contained 73 before
- XLAT, then after XLAT it would contain 201. If AL contained 97
- before XLAT, after XLAT it would contain 129.
-
- If we wanted to go the other direction we would have to make the
- EBCDIC string the source string, make the ASCII string the
- destination string, and use the other table:
-
- mov bx, offset ebcdic_to_ascii_table
-
- The rest of the code would be the same.
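If you think more easily in a high-level language, here is a rough C model of what XLAT and the translation loop compute. It is only a sketch: the names xlat, build_tables and translate are made up for illustration, and the tables hold just the seven character pairs listed above (every other entry is left at 0, where a real table would define all 256).

```c
#include <stddef.h>
#include <string.h>

/* Model of XLAT: AL = table[AL]. The table is 256 bytes, indexed by AL. */
static unsigned char xlat(const unsigned char table[256], unsigned char al)
{
    return table[al];
}

/* The seven pairs listed in this chapter, as {ASCII, EBCDIC}. */
static const unsigned char pairs[7][2] = {
    { 97, 129}, { 63, 111}, { 48, 240}, { 72, 200},
    { 73, 201}, { 74, 209}, { 75, 210}
};

static unsigned char ascii_to_ebcdic[256];
static unsigned char ebcdic_to_ascii[256];

/* Fill both directions. Entries not listed in the chapter stay 0 here
   (an assumption of this sketch, not a real translation). */
static void build_tables(void)
{
    memset(ascii_to_ebcdic, 0, sizeof ascii_to_ebcdic);
    memset(ebcdic_to_ascii, 0, sizeof ebcdic_to_ascii);
    for (int i = 0; i < 7; i++) {
        ascii_to_ebcdic[pairs[i][0]] = pairs[i][1];
        ebcdic_to_ascii[pairs[i][1]] = pairs[i][0];
    }
}

/* Same shape as translation_loop: load a byte, translate it, store it. */
static void translate(const unsigned char table[256],
                      const unsigned char *src, unsigned char *dst,
                      size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = xlat(table, src[i]);
}
```

Call build_tables() once, then translate(ascii_to_ebcdic, src, dst, 20) plays the part of the 20 byte loop. Going the other way changes only the table argument, just as the assembler version changes only what is loaded into BX.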
-
-
- Since this is done by the communications program, we won't
- concern ourselves with ASCII <-> EBCDIC any more, but we will use
- XLAT in two slightly different ways.
-
-
- First, let's categorize characters. Some characters are
- whitespace (tabs, newlines, spaces, form feeds, etc.). Others are
- octal digits, decimal digits, hex digits, punctuation, and so on.
- There is a pre-existing table called translation_table in the
- subdirectory XTRAFILE. Its pathname is \xtrafile\transtbl.obj. It
- has all 256 ASCII characters coded in the following way:
-
- WHITESPACE EQU 80h ; 1000 0000
- PUNCTUATION EQU 40h ; 0100 0000
- ALPHABETIC EQU 20h ; 0010 0000
- OCTAL EQU 10h ; 0001 0000
- DECIMAL EQU 08h ; 0000 1000
- HEX EQU 04h ; 0000 0100
- BOX_CHAR EQU 02h ; 0000 0010
- GREEK_CHAR EQU 01h ; 0000 0001
-
- If the character is whitespace, then the leftmost bit is set. If
- it is a Greek character (ASCII 224 - 239 on the PC) then the
- rightmost bit is set. If it is more than one thing, then all the
- appropriate bits are set. For instance, '6' is octal, decimal and
- hex, so its encoding is:
-
- '6' 0001 1100
-
- 'a' is both alphabetic and hex, so its encoding is:
-
- 'a' 0010 0100
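To see how such a table could be put together, here is a C sketch that builds one class byte per character code. Treat it as a guess at the idea, not as the contents of transtbl.obj: classify and class_table are hypothetical names, whitespace and punctuation are decided by the standard isspace and ispunct (which may not agree with the real table on every character), an ASCII machine is assumed, and BOX_CHAR is never set because the text does not say which codes count as box characters.

```c
#include <ctype.h>

#define WHITESPACE  0x80
#define PUNCTUATION 0x40
#define ALPHABETIC  0x20
#define OCTAL       0x10
#define DECIMAL     0x08
#define HEX         0x04
#define BOX_CHAR    0x02
#define GREEK_CHAR  0x01

/* Build the class byte for one character code (0-255). */
static unsigned char classify(unsigned char c)
{
    unsigned char bits = 0;

    if (c < 128) {                          /* plain ASCII half */
        if (isspace(c))            bits |= WHITESPACE;
        if (ispunct(c))            bits |= PUNCTUATION;
        if (isalpha(c))            bits |= ALPHABETIC;
        if (c >= '0' && c <= '7')  bits |= OCTAL;
        if (isdigit(c))            bits |= DECIMAL;
        if (isxdigit(c))           bits |= HEX;
    } else if (c >= 224 && c <= 239) {      /* Greek range given in the text */
        bits |= GREEK_CHAR;
    }
    /* BOX_CHAR is left unset: the text does not give a range for it. */
    return bits;
}

static unsigned char class_table[256];

/* One pass over all 256 codes fills the whole table. */
static void build_class_table(void)
{
    for (int c = 0; c < 256; c++)
        class_table[c] = classify((unsigned char)c);
}
```

After build_class_table(), class_table['6'] is 1Ch (0001 1100) and class_table['a'] is 24h (0010 0100), matching the two encodings worked out above.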
-
- The following program inputs a character, and finds out whether
- it is punctuation, a letter, etc. If it is none of the eight
- things, then the program prints that nothing was found. It is the
- same block of code over and over, so you might want to do only
- part, or you might want to cut it out with a word processor and
- insert it in the template file (don't forget to delete the page
- headers and page numbers).
-
-
- ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
- EXTRN translation_table:BYTE ;\xtrafile\transtbl.obj
-
- whitespace_banner db "It is whitespace." , 0
- punctuation_banner db "It is punctuation." , 0
- alphabet_banner db "It is alphabetic." , 0
- octal_banner db "It is octal." , 0
- decimal_banner db "It is decimal." , 0
- hex_banner db "It is hex." , 0
- drawing_banner db "It is a box drawing character." , 0
- greek_banner db "It is a Greek character." , 0
- nothing_banner db "No match was found." , 0
-
- dirty_flag db ?
- ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE
-
- ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
- WHITESPACE EQU 80h
- PUNCTUATION EQU 40h
- ALPHABETIC EQU 20h
- OCTAL EQU 10h
- DECIMAL EQU 08h
- HEX EQU 04h
- BOX_CHAR EQU 02h
- GREEK_CHAR EQU 01h
-
- ; set up the xlat table
- mov ax, seg translation_table
- mov es, ax
- mov bx, offset translation_table
-
- outer_loop:
- mov dirty_flag, 0 ; marker for success
- call get_ascii_byte ; input a byte to al
- xlat es:[bx] ; do the translation
-
- test al, WHITESPACE
- jz punct_check
- push ax ; save translation in al
- mov ax, offset whitespace_banner
- call print_string
- pop ax
- mov dirty_flag, 1 ; set the dirty flag
-
- punct_check:
- test al, PUNCTUATION
- jz alpha_check
- push ax ; save translation in al
- mov ax, offset punctuation_banner
- call print_string
- pop ax
- mov dirty_flag, 1 ; set the dirty flag
-
- alpha_check:
- test al, ALPHABETIC
- jz octal_check
- push ax ; save translation in al
- mov ax, offset alphabet_banner
- call print_string
- pop ax
- mov dirty_flag, 1 ; set the dirty flag
-
- octal_check:
- test al, OCTAL
- jz decimal_check
- push ax ; save translation in al
- mov ax, offset octal_banner
- call print_string
- pop ax
- mov dirty_flag, 1 ; set the dirty flag
-
- decimal_check:
- test al, DECIMAL
- jz hex_check
- push ax ; save translation in al
- mov ax, offset decimal_banner
- call print_string
- pop ax
- mov dirty_flag, 1 ; set the dirty flag
-
- hex_check:
- test al, HEX
- jz drawing_check
- push ax ; save translation in al
- mov ax, offset hex_banner
- call print_string
- pop ax
- mov dirty_flag, 1 ; set the dirty flag
-
- drawing_check:
- test al, BOX_CHAR
- jz greek_check
- push ax ; save translation in al
- mov ax, offset drawing_banner
- call print_string
- pop ax
- mov dirty_flag, 1 ; set the dirty flag
-
- greek_check:
- test al, GREEK_CHAR
- jz nothing_check
- push ax ; save translation in al
- mov ax, offset greek_banner
- call print_string
- pop ax
- mov dirty_flag, 1 ; set the dirty flag
-
- nothing_check:
- cmp dirty_flag, 0 ; was anything found?
- je print_nothing_banner
- jmp outer_loop
- print_nothing_banner:
- mov ax, offset nothing_banner
- call print_string
- jmp outer_loop
-
- ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE
-
- You need to:
-
- link prog1+transtbl+\asmhelp ;
-
- The program is long, but straightforward. Input a character and
- get its encoding. Test for each characteristic. If it is found,
- print the appropriate message and set the dirty_flag to indicate
- something was printed. At the end, if nothing was printed, print
- the failure message.
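Because the eight checks differ only in the mask and the message, the same logic can be written as a loop over (mask, message) pairs. The C sketch below shows that shape. The struct, the checks array and the describe function are illustrative names that do not appear in the assembler program, and the messages are collected in a buffer instead of being printed one at a time.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct check {
    unsigned char mask;     /* bit to test in the class byte */
    const char   *banner;   /* message to report if it is set */
};

static const struct check checks[8] = {
    {0x80, "It is whitespace."},
    {0x40, "It is punctuation."},
    {0x20, "It is alphabetic."},
    {0x10, "It is octal."},
    {0x08, "It is decimal."},
    {0x04, "It is hex."},
    {0x02, "It is a box drawing character."},
    {0x01, "It is a Greek character."}
};

/* Write one line per matching class into out (out_size must be > 0).
   Returns the number of matches; found plays the role of dirty_flag. */
static int describe(unsigned char class_byte, char *out, size_t out_size)
{
    int found = 0;

    out[0] = '\0';
    for (int i = 0; i < 8; i++) {
        if (class_byte & checks[i].mask) {
            size_t used = strlen(out);
            snprintf(out + used, out_size - used, "%s\n", checks[i].banner);
            found++;
        }
    }
    if (found == 0)
        snprintf(out, out_size, "%s\n", "No match was found.");
    return found;
}
```

Passing the class byte for '6' (1Ch) produces the octal, decimal and hex messages in that order. A class byte of 0 produces only the failure message.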
-
- Notice that the translation table is in ES and we are using a
- segment override for it. If you look at the EXTRN statement for
- 'translation_table', you will see that even though we are using
- ES, it is declared EXTRN in a segment with an:
-
- ASSUME ds:DATASTUFF
-
- statement. How can we get away with this? The assembler never
- deals with 'translation_table' directly. The only thing it does
- is put the offset in BX. We put the segment override in ourselves
- with:
-
- xlat es:[bx]
-
- so the assembler never has to decide whether a segment override
- is necessary or which segment override to use.
-
-
- WORD SEARCH
-
- When doing the mock word search program in the chapter on string
- instructions, I mentioned that it really wouldn't cut the mustard
- when it comes to real word searches. Why? If we are looking for
- "when" we also want to find "When". If we are looking for
- " searches ", we also want to find " searches,", that is,
- punctuation should not interfere unless we want it to, and
- capitals should not interfere unless we want them to. With the
- aid of a translation table, we will make a word search program
- which uses the following rules. In the SEARCH string (the string
- that defines what you are looking for):
-
- (1) Any small letter will match either a small or large
- letter.
- (2) A capital letter will match only a capital letter.
- (3) A blank will match any whitespace or punctuation.
- (4) A punctuation mark will only match itself.
-
- With these rules "Why" must start with a capital 'W' to be a
- match, but 'h' and 'y' may be either capital or small. " some,"
- may have any whitespace (including a carriage return) or
- punctuation in front, but must have a comma ',' at the end.
-
- This program has two data files. \XTRAFILE\SRCHTBL.OBJ contains
- the translation table. It is called "wordsearch_table" and is in
- DATASTUFF, so will be in our normal DS segment. In order to have
- text to search I have included an object file that is the text of
- a chapter from a book. (The object file text includes carriage
- returns). The text is a C string - it is terminated by a 0.
-
- The book was written by C.D. Huffam, and is the autobiographical
- account of his dual life as a writer and lecturer. The book is
- called "A Tale of Two C.D.s". The object file with the text is
- \XTRAFILE\TWOTALE.OBJ. It is in a private segment and will use ES
- as a segment register. There is also a straight text file which
- you can print out so you can see what is in the object file. It
- is \XTRAFILE\TWOTALE.DOC.
-
- Here's the program. The explanation is at the end.
-
- ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
- EXTRN tale_text:BYTE, wordsearch_table:BYTE
-
- entry_message db 13,10, "Enter a word for a word search", 0
- no_match_message db "There was no match", 0
- input_buffer db 80 dup (?)
- text_file_length dw ?
- letter_count dw ?
- ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE
-
- ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
-
- ; find the length of the text file
- mov ax, seg tale_text ; load es register
- mov es, ax
-
- mov di, offset tale_text ; offset to di
- mov bx, di ; copy to bx
- mov al, 0 ; try to match zero
- cld ; clear DF (increment)
-
- string_end_loop:
- scasb ; search for zero
- jne string_end_loop
-
- dec di ; one too many, so decrement
- sub di, bx ; finish - start = length
- mov text_file_length, di ; length of text_file
-
-
- big_loop:
- ; get a word for the word search
- mov ax, offset entry_message
- call print_string
- mov ax, offset input_buffer
- call get_string
-
- ; find the end of string
- mov al, 0 ; compare with 0
- mov bx, offset input_buffer
- mov cx, 0 ; letter count
- letter_count_loop:
- cmp al, [bx] ; compare to 0
- je end_of_count_loop
- inc cx ; increment count
- inc bx ; increment pointer
- jmp letter_count_loop
- end_of_count_loop:
- cmp cx, 0 ; if 0, string is empty
- je big_loop ; so start again
- mov letter_count, cx
-
- ; look for word match. In this program, the text string
- ; is referenced by si and the search string is referenced
- ; by di.
-
- mov si, offset tale_text
- mov cx, text_file_length ; length of file
- sub cx, letter_count ; last possible match
- inc cx ; +1 for boundary condition
-
- ; set up translation table ( it is in DATASTUFF )
- mov bx, offset wordsearch_table
-
-
- word_search_loop:
- push si ; save a copy
- push cx ; save a copy
- mov di, offset input_buffer
- mov cx, letter_count
-
-
- letter_loop:
- mov al, es:[si] ; text to al
- cmp al, [di] ; same as search string?
- je next_letter
- xlat ; if not, translate
- cmp al, [di] ; allowable substitute?
- jne new_start ; if not, start at new place
- next_letter:
- inc di ; move to next letter
- inc si
- loop letter_loop
-
- ; we fell through, so we found a complete match
- jmp found_it
-
- ; no match. are we finished?
- new_start:
- pop cx
- pop si
- inc si ; move to next character
- loop word_search_loop
-
- ; we fell through. finished, but no match
- mov ax, offset no_match_message
- call print_string
- jmp big_loop
-
- found_it:
- pop cx ; take cx off the stack
- pop si ; start of the match
-
- ; move 25 characters to buffer for printing
- mov di, offset input_buffer
- mov cx, 25
-
- character_move:
- mov al, es:[si]
- mov [di], al
- inc si ; increment pointers
- inc di
- loop character_move
-
- mov BYTE PTR [di], 0 ; end of string
- mov ax, offset input_buffer
- call print_string
- jmp big_loop
- ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE
-
- You need to:
-
- link prog2+twotale+srchtbl+\asmhelp ;
-
- to get asmhelp and the two data files in the program.
-
- This program is very similar to the search program in the chapter
- on strings. However, because of where the files are, the pointers
- have been changed around. Therefore, it is safer if you simply
- cut out the program with a word processor and paste it into the
- template file rather than try to modify the previous search
- program.{2}
-
- It is assumed that you did the string match program. The logic is
- the same and will not be covered again. First we input a search
- string. Then, starting at the beginning of the text to be searched,
- we check till we find the first match. If we find a match, we
- print out 25 characters starting with the first character of the
- match. If no match is found, a message to that effect is printed.
-
- The character match is a two step process. The character from the
- text is put in AL. It is compared with the search character for
- an EXACT match. If they match, we are done. If not, we use XLAT
- on AL (the character from the text) which will translate to its
- allowable substitute. In fact, all this is just: (1) all capital
- letters become small, (2) all punctuation becomes spaces, and (3)
- all whitespace becomes spaces. Once again, we compare AL with the
- search character. If we have a match, ok. If not, we start over.
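The two step match is easy to model in C. This is a sketch under assumptions: search_table is built from the three rules just described using the standard isupper, ispunct and isspace (the real table in SRCHTBL.OBJ may classify some characters differently), an ASCII machine is assumed, and the names build_search_table, chars_match and find_word are made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static unsigned char search_table[256];

/* Capitals become small letters; punctuation and whitespace become a
   space; every other code translates to itself. */
static void build_search_table(void)
{
    for (int c = 0; c < 256; c++) {
        if (c < 128 && isupper(c))
            search_table[c] = (unsigned char)tolower(c);
        else if (c < 128 && (ispunct(c) || isspace(c)))
            search_table[c] = ' ';
        else
            search_table[c] = (unsigned char)c;
    }
}

/* Step 1: exact compare. Step 2: compare the translated text byte. */
static int chars_match(unsigned char text_ch, unsigned char search_ch)
{
    if (text_ch == search_ch)
        return 1;
    return search_table[text_ch] == search_ch;
}

/* Offset of the first match of pattern in text, or -1 if there is none. */
static long find_word(const char *text, const char *pattern)
{
    size_t text_len = strlen(text);
    size_t pat_len = strlen(pattern);

    if (pat_len == 0 || pat_len > text_len)
        return -1;
    for (size_t start = 0; start + pat_len <= text_len; start++) {
        size_t i = 0;
        while (i < pat_len &&
               chars_match((unsigned char)text[start + i],
                           (unsigned char)pattern[i]))
            i++;
        if (i == pat_len)
            return (long)start;
    }
    return -1;
}
```

Only the text byte is ever translated, never the search byte. That is why a small letter in the search string matches a capital in the text but not the other way around, and why a punctuation mark in the search string matches only itself.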
-
- The text is in ES, the translation table is in DS, so it is
- inconvenient to use the string instructions in this program.
-
- Try to match a word at the beginning of the line, end of the
- line, with and without punctuation and with and without capitals.
- If you go across a line break, you need to substitute two blanks
- in the search string for CRLF (13,10).
-
-
- ____________________
-
- 2. You should understand what is going on in the code before
- you run these programs. I didn't write the code for myself, I
- wrote it for you. If you run it but don't understand it, it won't
- help you a bit.
-
- Suppose you are not interested in all 256 values of the
- translation table. Let's say that you only want to have a
- translation table for the numbers from 0 to 99. Can you still use
- this? Yes, but you need to put in some range checking to make
- sure that you have valid data.
-
- MAX_VALUE EQU 99
-
- mov al, data_byte ; byte to al
- cmp al, MAX_VALUE ; too large?
- ja data_error ; report error
- xlat
-
- This ensures that any data that is out of range is not
- translated. Therefore the translation table only needs to be 100
- bytes long (0 - 99).
-
- If you want more than 256 elements in the translation table, a
- byte can no longer index it, so you cannot use XLAT. You need a
- word index, and you can write your own code to do the same thing
- (here the table entries are words too).
-
- MAX_VALUE EQU 999
- my_translation_table dw 1000 dup (?)
-
- If you put the translation data into the table, you can then have
- the following code:
-
- mov bx, offset my_translation_table
-
- ; - - - - - translation block
- mov si, data_word ; word to si
- cmp si, MAX_VALUE ; too large?
- ja data_error
- shl si, 1 ; SI x 2 = number of BYTES into table
- mov ax, [bx+si] ; base + offset
- ; - - - - - end of translation block
-
- XLAT is about twice as fast as this last code, so when you have a
- choice always use XLAT.
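For comparison, here is the range-checked word lookup modeled in C. It is a sketch only: translate_word is a hypothetical name, and returning 0 stands in for the jump to data_error. Notice that C scales the index by the element size for you, which is the job the SHL SI, 1 does by hand in the assembler version.

```c
#include <stdint.h>

#define MAX_VALUE 999

/* 1000 word-sized entries, like my_translation_table dw 1000 dup (?) */
static uint16_t my_translation_table[MAX_VALUE + 1];

/* Returns 1 and stores the translation in *result if index is in range.
   Returns 0 (the data_error case) and leaves *result alone otherwise. */
static int translate_word(uint16_t index, uint16_t *result)
{
    if (index > MAX_VALUE)      /* unsigned compare, like JA */
        return 0;
    *result = my_translation_table[index];
    return 1;
}
```

The check comes before the table is touched, so an out-of-range value never reads past the end of the array.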
-
- SUMMARY
-
-
- XLAT
-
- BX holds the address of a 256 byte array called a
- translation table. AL holds the character to be translated.
- If x is the value in AL before XLAT, then after XLAT,
- AL=array[x].
-
-